74 ◾ Bioinformatics
The size of this SAM file is about 19G. We can convert it to the BAM format using Samtools
to save some storage space. First, you need to change to the “sam” directory and then run
the following:
samtools view \
-uS \
-o SRR769545_mem.bam \
SRR769545_mem.sam
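The new BAM file is about 15G. The “-u” option writes an uncompressed BAM file; using “-b”
instead would produce a compressed BAM file, which is considerably smaller. We can then
delete the SAM file as follows to save some storage space: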
rm SRR769545_mem.sam
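Because “rm” is irreversible, it may be worth confirming that the BAM file is intact before
deleting the SAM file. The following is a minimal sketch, not part of the original workflow:
the function name “remove_sam_if_bam_ok” is hypothetical, and it assumes a Samtools version
that provides the “quickcheck” subcommand.

```shell
# Hypothetical helper (not from the book): delete a SAM file only after
# the BAM file made from it passes basic sanity checks.
remove_sam_if_bam_ok() {
    sam="$1"
    bam="$2"

    # The BAM file must exist and be non-empty.
    if [ ! -s "$bam" ]; then
        echo "BAM file is missing or empty: $bam" >&2
        return 1
    fi

    # Refuse to delete anything if samtools is not available for the check.
    if ! command -v samtools >/dev/null 2>&1; then
        echo "samtools not found; keeping $sam" >&2
        return 1
    fi

    # quickcheck exits with a non-zero status if the BAM file is
    # truncated or lacks a valid header.
    if ! samtools quickcheck "$bam"; then
        echo "BAM file failed quickcheck; keeping $sam" >&2
        return 1
    fi

    rm -- "$sam"
}
```

From within the “sam” directory, it could be called as
“remove_sam_if_bam_ok SRR769545_mem.sam SRR769545_mem.bam”. Note that quickcheck only
detects gross problems such as truncation; it does not compare the contents of the two files.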
BWA-MEM2 is a recently released, optimized implementation of the BWA-MEM algorithm. It
produces alignments identical to those of BWA-MEM, but it runs faster and its index
occupies less storage space and memory [18]. You can install BWA-MEM2 separately by
following the instructions available at “https://github.com/bwa-mem2/bwa-mem2”.
2-BWA-SW
The BWA-SW [9] algorithm, like BWA-MEM, can be used for the alignment of
single- and paired-end long reads generated by all platforms. It uses the Smith-Waterman
(SW) local alignment approach to map reads to a reference genome. BWA-SW has better
sensitivity when alignments have frequent gaps. However, this algorithm has been
deprecated by its developer because BWA-MEM has been restructured for better performance.
The following “bwa bwasw” command performs the read alignment as above:
bwa bwasw \
-t 4 \
refgenome/GRCh38.p13_ref.fna \
data/SRR769545_1.fastq.gz \
data/SRR769545_2.fastq.gz \
> sam/SRR769545_bwasw.sam 2> sam/SRR769545_bwasw.log
You can convert this SAM file to a BAM file as we did above, or you can simply delete it
to save some space.
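If the intermediate SAM file is not needed, the aligner output can instead be piped straight
into Samtools so that only a BAM file is written. The sketch below is an illustration rather
than part of the original workflow: the function name “bwasw_to_bam” and the output file
name are hypothetical, and it assumes that both “bwa” and “samtools” are on the PATH.

```shell
# Hypothetical helper (not from the book): align with "bwa bwasw" and
# write compressed BAM directly, without an intermediate SAM file.
bwasw_to_bam() {
    ref="$1"    # indexed reference FASTA
    r1="$2"     # forward reads
    r2="$3"     # reverse reads
    out="$4"    # output BAM file

    # Stop early if any input file is unreadable.
    for f in "$ref" "$r1" "$r2"; do
        if [ ! -r "$f" ]; then
            echo "Cannot read input file: $f" >&2
            return 1
        fi
    done

    # bwa writes SAM to standard output; "samtools view -b" converts it
    # to BAM, and the trailing "-" tells samtools to read standard input.
    # The bwa log messages go to a file next to the BAM file.
    bwa bwasw -t 4 "$ref" "$r1" "$r2" 2> "${out%.bam}.log" \
        | samtools view -b -o "$out" -
}
```

With the file names used above, the call might look like
“bwasw_to_bam refgenome/GRCh38.p13_ref.fna data/SRR769545_1.fastq.gz data/SRR769545_2.fastq.gz sam/SRR769545_bwasw.bam”.
The return status of the pipeline reflects only the samtools step, so the log file should
still be checked for errors reported by bwa.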
3-BWA-backtrack
The BWA-backtrack algorithm is designed for aligning Illumina short reads of up to
100 bp with sequencing error rates below 2%. It involves two steps: (i) using “bwa
aln” to find the coordinates of the positions on the reference genome where the short
reads align, and then (ii) generating alignments with “bwa samse” for single-end reads
or “bwa sampe” for paired-end reads. The base call quality usually deteriorates toward
the end of reads generated by Illumina instruments. This algorithm optionally trims
low-quality bases from the 3′-end of the short reads before alignment. Therefore, it is
able to align more reads that have a high error rate toward their 3′-ends.
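The two steps for paired-end reads can be sketched as follows. This is an illustration under
stated assumptions, not a definitive command set: the function name “backtrack_pe”, the
output prefix, and the trimming threshold of 15 passed to the “-q” option of “bwa aln” are
hypothetical choices, and the reference is assumed to have been indexed already with
“bwa index”.

```shell
# Hypothetical helper (not from the book): two-step BWA-backtrack
# alignment of paired-end reads with optional 3'-end quality trimming.
backtrack_pe() {
    ref="$1"       # indexed reference FASTA
    r1="$2"        # forward reads
    r2="$3"        # reverse reads
    prefix="$4"    # prefix for the .sai and .sam output files

    # Stop early if any input file is unreadable.
    for f in "$ref" "$r1" "$r2"; do
        if [ ! -r "$f" ]; then
            echo "Cannot read input file: $f" >&2
            return 1
        fi
    done

    # Step (i): "bwa aln" finds the coordinates for each read file.
    # "-q 15" (an assumed threshold) enables trimming of low-quality
    # bases at the 3'-end; "-t 4" uses four threads.
    bwa aln -t 4 -q 15 "$ref" "$r1" > "${prefix}_1.sai" || return 1
    bwa aln -t 4 -q 15 "$ref" "$r2" > "${prefix}_2.sai" || return 1

    # Step (ii): "bwa sampe" combines both .sai files with the reads
    # to generate paired-end alignments in SAM format.
    bwa sampe "$ref" "${prefix}_1.sai" "${prefix}_2.sai" "$r1" "$r2" \
        > "${prefix}.sam"
}
```

For single-end reads, step (ii) would use “bwa samse” with one “.sai” file and one read file
instead of “bwa sampe”.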